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Understanding the behaviour of biological systems requires a complex setting of in vitro and in vivo 
experiments, which attracts high costs in terms of time and resources. The use of mathematical mod- 
els allows researchers to perform computerised simulations of biological systems, which are called 
in silico experiments, to attain important insights and predictions about the system behaviour with a 
considerably lower cost. Computer visualisation is an important part of this approach, since it pro- 
vides a realistic representation of the system behaviour. We define a formal methodology to model 
biological systems using different levels of representation: a purely formal representation, which we 
call molecular level, models the biochemical dynamics of the system; visualisation-oriented repre- 
sentations, which we call visual levels, provide views of the biological system at a higher level of 
organisation and are equipped with the necessary spatial information to generate the appropriate vi- 
sualisation. We choose Spatial CLS, a formal language belonging to the class of Calculi of Looping 
Sequences, as the formalism for modelling all representation levels. We illustrate our approach using 
the budding yeast cell cycle as a case study. 

1 Introduction 

The high complexity of biological systems has encouraged during the last decades extensive inter- 
disciplinary research comprising biology and other fields of science such as computer science, math- 
ematics, physics and chemistry. This interdisciplinary approach to biology resulted in a new field of 
study, called systems biology, which focuses on the systematic study of complex interactions in biologi- 
cal systems. 

Computer scientists find many interesting similarities between systems biology and theory of con- 
currency. Degano and Priami [14] claim that both systems biology and formal methods for concurrency 
can cross-fertilize each other. Being based on sound and deep mathematics, concurrency theories may 
offer solid ways to describe biological systems and safely reason upon them. On the other hand, systems 
biology studies many complex biological phenomena. Modelling and reasoning about these complex 
phenomena may require techniques that are more efficient and reliable than existing techniques. It is 
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expected that the effort to understand biological mechanisms in terms of computer technology will pos- 
sibly lead to new techniques that are more robust, efficient and reliable to model and analyse complex 
systems. 

Many mathematical formalisms have been proposed to model and analyse systems biology. Some of 
them are based on formalisms generally used for modelling concurrent systems, such as Petri Nets [31, 
32, 19], the 7r-Calculus [10, 30, 36, 37], and CCS (Calculus of Communicating Systems) [12]. Other 
formalisms are inspired by biological phenomena, such as compartmentalisation: Brane Calculi [7, 13], 
P Systems [28, 27], Calculi of Looping Sequences [2, 3], and Biocham [6, 8]. In particular, Calculi 
of Looping Sequences is a class of formalisms: Calculus of Looping Sequences (CLS) is the basic 
formalism [2] and has two important extensions, Stochastic CLS [2] in which reactions are associated 
with rates, and Spatial CLS [3], which includes spatial information. 

These formalisms support the analysis of biological systems using tools based on numerical sim- 
ulation, stochastic simulation and model-checking techniques. Bianco and Castellini developed PSim, 
a simulator for Metabolic P Systems, a variant of P Systems [5]. Scatena developed a simulator for 
Stochastic CLS [33]. Biocham is equipped with a tool for the simulation and model-checking of bio- 
logical systems. The probabilistic model-checker PRISM has also been used to analyse some biological 
properties [21, 23, 22]. All these tools provide textual output as well as a plot of the simulation. 

Texts and plots provide very detailed information on specific aspects of the analysed biological sys- 
tem. However, they are often inadequate when the aim is to acquire global knowledge about the high- 
level organisation and dynamics of the biological system. For example, in most experiments the analyst 
can only vary molecular concentrations in the environment and within cells, whereas the aim of an ex- 
periment or simulation may be to observe the resultant behaviour of cells or even the whole organ or 
organism. Such high-level behaviours can be better described through two or three dimensional visuali- 
sation/animation rather than using texts and plots. 

One approach for modelling and visualising biological systems is based on the use of L-systems. L 
Systems use rewriting systems for modelling biological processes [16]. However, they are used mainly 
for visualising the development of plants [18, 29, 15]. Hammel and Prusinkiewicz model the behaviour 
of Anabaena at cellular level [18] by considering interactions between cells with external factors in the 
environment. 

Michel, Spicher and Giavitto use rule-based programming language MGS to model and simulate the 
A phage genetic switch [25]. They present a multilevel model of the system; a molecular level defined 
using built-in Gillespie's algorithm and a population of cells level defined using GBF (Group Based 
Field) and Delaunay topological collections. 

David Harel and his group developed an approach in modelling at different levels of representa- 
tion [20]. They use object oriented approach and define the cell as the basic building block of their ap- 
proach. Their approach uses scenario to define system behaviour and uses animation on a 2-dimensional 
grid [1]. Scenarios define cell behaviour related with interactions between molecules in the environment 
and their receptors on cell membranes. Another interesting application of their approach is the mod- 
elling of pancreatic organogenesis [34]. In this application they show how molecular interactions affect 
cell growth and, in the end, affect the growth of mammalian pancreas. A three dimensional visualisation 
is used to visualise the pancreatic organogenesis process. 

Another tool used to visualise biological systems according to a model of the system at the molecular 
level is Virtual Cell [35]. This tool is based on a deterministic numerical simulation of the model, which 
is defined using differential equations. 

Technology has helped biologists to observe biological systems at microscopic level, down to the 
molecular level. More knowledge has been gained about cell structure and behaviour. Although it is 
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possible to track the causes of certain phenomena at cellular level down to specific biochemical reaction 
occurring at molecular level, we are still far from being able to entirely explain cell behaviour in terms of 
the biochemical reactions occurring within the cell and its environment. Moreover, based on natural ob- 
servation and experiments, it is also possible to give an accurate description of all phenomena observable 
at cellular level. 

We define an approach to model biological systems at different levels of representation. We con- 
sider the molecular level, in which organic molecules interact through biochemical reactions, as the 
lowest level of our hierarchy. At this level, the representation is merely a mathematical formalisation 
of biochemical reactions, with no visualisation. The representation of higher levels of organisation and 
dynamics such as a cell, a tissue, an organ or an organism, is inspired by observations of the system 
behaviour under normal conditions. At these levels the mathematical formalisation mimics organisation 
and dynamics observed in nature and replicated in controlled experiments, with the aim to give a visu- 
alisation, possibly in three dimensions, of the modelled phenomena. We refer to these higher levels as 
visual levels, in contrast to the molecular level, for which we do not provide any visualisation. Each 
visual level has to be linked to the molecular level by a formal description of the way biochemical re- 
actions cause transitions of visual states. However such causal relations are not always fully understood 
in terms of biological theories. Therefore, transitions of visual states may sometimes be governed by 
rates observed in nature rather than by the underlying biochemistry modelled at molecular level or may 
be directly associated with the introduction or removal of biochemical signals, whose accumulation or 
degradation is not sufficiently explained in terms of biological theories. 

This approach allows us to introduce changes to cells and their environment at the molecular level 
and observe the impact of such changes at the visual level. Changes at the molecular level may range 
from varying the concentration of a specific molecule or introducing a new kind of molecule with specific 
properties to the introduction of a plasmid within the cell or a virus in its environment. 

The work of David Harel and his group [20, 1] and the work of Hammel and Prusinkiewicz [18] 
have similarity with our approach, in the sense that they also represent both molecular and cellular level. 
However their approaches consider molecules only as external factors that affect cells behaviour. The 
cellular level is modelled only in the environment and on the cell membrane, but not inside the cell. 
Moreover in David Harel 's approach, all rules are deterministic and randomness can only be introduced 
by defining random initial states of the system. In contrast, in our approach, we model and simulate 
stochastic behaviour by introducing reaction rates and controlling the occurrences of reactions using 
Gillespie's algorithm. 

Slepchenko, Schaff, Macara and Loew take an approach closer to ours by developing a tool to model 
biological systems at molecular level and visualise them at cellular level [35]. Their approach, how- 
ever, is based on differential equations, which are deterministically simulated using numerical simulation 
methods. 

In this paper we use Spatial CLS to model both the molecular and visual level. We consider the 
cellular level as visual level and we use the cell cycle of budding yeast as a case study. However, spatial 
information is only introduced at the visual level, while the molecular level is modelled using a subset of 
Spatial CLS with no spatial information, which is equivalent to Stochastic CLS. 

The simulation is performed using a variant of Gillespie's algorithm which extends the variant in- 
troduced by Basuki, Cerone and Milazzo [4] with the addition of a mechanism for choosing individual 
cells, which is governed by exponential distribution of reaction time. 
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2 Using Spatial CLS for Visualisation 

Calculi of Looping Sequences is a class of formalisms developed at the University of Pisa for the purpose 
of modelling biological systems [26, 2]. In this section we focus on one variant of the calculi called 
Spatial CLS [3] , which supports the modelling of details about position and movement of the components 
in the system. 

A Spatial CLS model contains a term and a set of rewrite rules. The term describes the initial state 
of the system, and the rewrite rules describe the events which may cause the system to evolve. We start 
by defining the syntax of terms. We assume a possibly infinite alphabet S of symbols ranged over by 
a,b,c,... and a set ^ of names of movement functions. Each symbol represents an atomic component 
of the system. 

Definition 1 Terms T, Branes B and Sequences S are given by the following grammar: 

T ::= X | {S) d \ (B d ) L \T | T\T 



B 

S 



= A ! (S) d B | B 
= e a S-S 



where a is a generic element of S, e represents the empty sequence, and d G Q> = ( (W 1 x U { . } ) x M + . 
We denote with 2f, and 5? the infinite set of terms, branes and sequences. 

There are four operators in the formalism. A sequencing operator _ • _ is used to compose some 
components of the system in a structure that has the properties of a sequence. For instance, sequencing 
can be used to model DNA/RNA strands or proteins. The looping operator (_) L and containment operator 

_ J _ are always applied together, hence they can be considered as a single binary operator (_) L J _ to be 
applied to one brane and one term. Looping and containment allow the modelling of membranes and 
their contents. Finally the parallel composition operator | is used to compose juxtaposition of entities 
in the system. Brackets can be used to indicate the order of application of the operators, and we assume 
(_) L J _ to have precedence over _ | _. 

In Spatial CLS there are two kinds of terms, positional and non-positional terms. Positional terms 
have spatial information, while non-positional terms do not. A term with a '.' in its spatial information 
represents a non-positional term. 

Every positional term is assumed to occupy a space, modelled as a sphere. The spatial information of 
a term contains three parts: the position of the centre of the sphere, its radius and a movement function. 
In this way every object is assumed to have an autonomous movement. Variable d in the above syntax 
models such spatial information. In this paper we only consider terms without autonomous movement. 
Therefore, we can omit movement functions from our spatial information. 

We now define patterns, which are terms enriched with variables. We assume a set of position 
variables PV, a set of symbol variables 3£ and a set of sequence variables SV. We denote by T the set 
of all instantiation functions t : PV — ► & for the position variables. 

Definition 2 Left Brane Patterns BPi, Sequence Patterns SP and Right Brane Patterns BPr are given by 
the following grammar: 



BP L 
BPr 
SP 

where u G PV, x G 3£, x £ SV and g £ T 



(SP) U BP L | BP L 
{SP) g BP R I BPr 
e a SPSP \ x \ x 
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Definition 3 Left Patterns Pi and Right Patterns Pr are given by the following grammar: 

Pl ::= {SP) U | (BPlx) L u \Plx I Pl\Pl 
BP LX ::= BP L \ BP L \X \ X 
Plx ::= Pl \ Pl\X 
Pr ::= e \ {SP) g \ (BP RX ) L g \ Pr \ Pr\Pr \ X \ X 
BPrx ■'•= BPr BPr\X X 

where u G PV, x 6 JT, x G SV and g£T. 

We denote the sets of all left patterns by g?i, the set of all right patterns by 3?r. We denote by Var(f ) 
the set of all variables appearing in a pattern P, including position variables from PV. 

Definition 4 A rewrite rule is a 4-tuple (f c ,Pi,PR,k), usually written as 

where f c : T — ► {tt,ff}, k £ R + , Var(PR) C Var(Pi), and each function g appearing in Pr refers only to 
position variables in Var(Pi). 

Rewrite rules are used to define reactions that may occur in a system. The left and right patterns of a 
rule is matched against the term that represents the current state of the system, to check its applicability. 
It is possible to define a rewrite rule that has a precondition defined by f c . A precondition is checked 
against any term matches in the left pattern of a rule. The rate constant k models the propensity of a 
reaction. The value Ilk represents the expected duration of a reaction involving reactant combinations. 

Barbuti, Maggiolo-Schettini, Milazzo and Pardini [3] define a semantics for Spatial CLS, as a Proba- 
bilistic Transition System. In this semantics, they consider each evolution step of a biological system as 
composed of two phases: (1) application of at most one reaction; (2) updating of positions according to 
the movement functions. In this paper we omit movement functions, simplifying the evolution step into 
one phase only. 

The combination of looping and containment operators and parallel composition operators in Spatial 
CLS defines a notion of layers in the terms. The parallel composition operators model the objects as 
multisets, while the looping and containment operators create boundaries between these multisets. Ob- 
jects within the same boundary are considered to be in the same layer. We need to define some functions 
in order to calculate the rate of a rule application. In the following definitions, we denote the multiset 
of top-level elements appearing in a pattern P by P, and assume the function n : x —> N such that 
n(7\,r2) gives the number of times T\ appears at top-layer in T2. We define T as instantiation function 
for position variables and a as instantiation function for other variables. 



comb{P L \ I P L 2,T,0 
comb{(BP LX ) L u \P LX ,T,c 
comb(SP u , T, a 

combiPi I U,Z,(J 

comb'iP^ T, a 
binom(Ti,T2,TT, 



= n 



comb(Pu , T, o).comb{Pi2, T, d) 
comb'{BP LX ,Z, o).comb'(P LX ,T, a) 
1 

/ n((PL\U)TO,T 
TeP L ra^ n (p LTG ,T) 

comb(Pi, T, a) 

tt tj^TsJ) n(T 2 ,T)+i 
1 Irer! ll != i 



.comb(P L ,r,o), UGBVUTV 



R,T,c 



n(T 2 ,T)-n(T u T)+i 

Given a finite set of rewrite rules let C R be the set of all brane rules and let with R G . 
T e 3~ and c G N, be the least labeled transition relation on terms satisfying inference rules in Figure 1 . 



56 



Modelling Cell Cycle in Different Levels of Abstraction 



(R:[f c ]p L ^p R )eM f c (r)=tt t £ T q g L 

R,P L T(y,comb(P L ./c,a) 



w; j n ^fe « j r, w; j * ±fe « J r, 



, R,T,c.binom{T,T u T 2 ) , . 

T\ | r 2 ► r/ 1 t 2 



Figure 1 : Inference rules used for calculating rates of rewrite rules 



The following definition gives all the reactions enabled in a state, by also taking into account the 
subsequent rearrangement: 

Appl(R,T) = (Tr,c,T',T")\T ^ T' AT" = Arrange^ 1 ) ^± 

( R) 

The number m T of different reactant combinations enabled in state T, for a reaction R, and the total 
number mj of reactions considering a set of rules are defined as: 

(R) \ 

(Tr,c,T',T")eAppl{R,T) 

and 

L(R) 
m T 

ReM 

Let T describe the state of the system at a certain step, and Icr denote the rate associated with a 
rewrite rule R. At each step of the evolution of the system, in order to assume that at most one reaction 

(R) 

can occur, we have to choose a time interval 8t such that (£ R G 0£kRtrv T ')8t < 1. Given a set of rewrite 
rules M, we choose an arbitrary value N such that for each rule R 6 M it holds < kR/N < 1. Then we 
compute the time interval for a step as St = 1 /Nmj, thus satisfying the above condition. The value of ./V 
also determines the maximum permitted length of each step as 1 /N time units. 
The probability that no reaction happens in the time interval 8t is: 

Re.<% (Tr,c,T',T")eAppl(R,T) v T 

and the probability P(T\ — > Tz,t) of reaching state T 2 from T\ within a time interval St after t is such that: 

4i pv.r.r^ftro^ I otherwise 



The semantics of Spatial CLS is defined as Probabilistic Transition System as follows. 
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Definition 5 Given a finite set of rewrite rules ffl, the semantics of Spatial CLS is the least relation 
satisfying the following inference rules: 

{T r ,c,T',T 2 )€Appl(R,Ti) fief 
p = P{T l ^T 2 ,t) St = 



(T u t)^(T 2 ,t + 8t) 



p = P(T->T',t) 8t 



Nmr t 



N max( 1 .mj { ) 

(T,t)^{T',t + St) 

Application of a reaction may still result in non well-formed terms (e.g. a term that contains collision 
of two objects). To solve this problem, we define the algorithm described in Section 4 to rearrange objects 
in the system. In this section we describe some assumptions about space and spatial information used in 
our approach. 

First, we limit the direction where any object in the system can move. We assume that the space 
occupied by each object in the system is in W 1 and each object can only move along one axis at one time. 
For every axis, there are two possible directions. So in total there are 2n possible directions. Therefore 
in M 3 any object can move along 6 possible directions. 

Positional terms in Spatial CLS have size, which is defined as the radius of the sphere that encloses 
the term. It is possible to define reactions that modify this size. In our algorithm, we assume that 
the maximum sizes of all objects in the same layers are known. We assume the existence of an n- 
dimensional grid, which divides the space occupied by a biological system into cubes (we are working 
in n=3 dimensions) with fixed size defined in such a way that every object in the system fits inside it. 

Initially objects of a biological system are positioned inside cubes. Reactions defined for a biological 
system can create new objects, move existing objects to different positions or remove objects from the 
system. New objects are also positioned inside cubes. Therefore, the only time we need rearrangement is 
when two objects are inside the same cube. In this case, we will need to run the rearrangement algorithm 
explained in Section 4. 



3 Levels of Representation in Spatial CLS 

In this section we define an approach to model biological systems at different levels of representation 
using Spatial CLS. We distinguish between a molecular level, in which rewrite rules are used to model 
biochemical reactions among molecules, and one or more visual levels, in which rewrite rules define the 
dynamics of a higher level of organisation of the biological system under analysis. These rewrite rules 
refer to a single level of representation. Therefore, we call them horizontal rules. 

In this paper we consider only the cellular level as a visual level. At visual level the state of the 
system, which is called visual state, is defined using the spatial information in positional terms of the 
system. A visual state describes three kinds of information: 

1. spatial information; 

2. a stage of the system evolution, which we call visual stage; 

3. information on whether that stage has been visualised. 

Example of stages of the system at cellular level are a small cell at the beginning of the growing phase 
or a cell with two nuclei during mitosis. Within a specific stage, state transitions are triggered by rewrite 
rules that modify spatial information and tag the current stage of the system evolution as "visualised". 
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In this way, we can attain visualisation using spatial information. Moreover, visual states represent 
both the biological stage of the system and the status of the visualisation, while rewrite rules control the 
flow of the visualisation. Visualisation can then be used to simulate the behaviour of the system and to 
perform comparison with and prediction of in vivo and in vitro experiments. 

At molecular level we can see biological systems as composed of molecules which are not associated 
with spatial information in our model. The state of the system is represented by the combination of 
molecular populations. State transitions are defined as rewrite rules representing biochemical reactions 
modifying molecular populations. Therefore the molecular level is modelled using a subset of Spatial 
CLS equivalent to Stochastic CLS. 

To model a biological system, we can start from the visual level. The information about the system 
behaviour at this level is purely descriptive. It is based on observation of visible events independently of 
the biochemical processes that cause them. Such visual events are modelled through transitions of visual 
states, whose spatial information is defined in such a way to mimic visible events observed during in vivo 
and in vitro experiments. Molecules are confined within membranes using the looping and containment 
operator. However, such a confinement is logical rather than spatial. Chemical reactions are modelled 
by rules with no visual effect and then linked to the visual level by vertical rules, which control the 
transition of stage in the system evolution, by evaluating conditions at molecular level and checking that 
the current stage has been visualised. 

4 Case Study: Cell Cycle 

In this section we illustrate our approach using the cell cycle of budding yeast as a case study. 
4.1 Visual Level 

Cell cycle consists of four phases: G\ - S - G2 - M. However, at cellular level, only phase S fully 
characterises a visual stage, that is the stage where chromosomes inside the nucleus are replicated. Phase 
G\ incorporates different steps of cell growth, phase G2 does not have any visual counterpart and phase 
M includes one visual stage corresponding to nucleus division. 

In our model we consider only two steps in cell growth: the cell size before and after the growth. 
Therefore we define 4 visual stages: 

1. small cell before growth (beginning of phase Gi); 

2. big cell after growth (end of phase G\); 

3. chromosomes inside the nucleus (end of phase S); 

4. cell with two nuclei (phase M before cytokinesis). 

Based on the above explanation, we define three variables identifying visual stages: 

• cell radius; 

• number of nuclei in a cell; 

• number of chromosomes in a cell nucleus. 

Table 1 shows the values of these variables in each visual stage. 

State transitions are defined using Spatial CLS rewrite rules. We adopt an example similar to the one 
presented by Barbuti, Maggiolo-Schettini, Milazzo and Pardini [3]. We define four rewrite rules to model 
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Stage 


cell radius 


# of nuclei 


# of chromosomes 


avg. time (min) 


1 


3r/4 


single 


single 


40 


2 


r 


single 


single 


30 


3 


r 


single 


double 


25 


4 


r 


double 


double 


5 



Table 1 : The four visual stages of cell-cycle 



the cell cycle. Barbuti, Maggiolo-Schettini, Milazzo and Pardini consider the 24 hour mammalian cell 
cycle [3]. In this paper, we model budding yeast cell cycle whose duration is only about 100 minutes [9]. 
The initial state of the system is defined by the following term: 

(b) L *\ HfoA/],|J ((w)Lj (cr-8m.gB5\cr.gB2.gC2Q)\stage l ) 

The above term represents a sphere with radius R, which contains a cell positioned in its centre. The cell 
contains one nucleus, with 2 chromosomes inside. Each chromosome contains 2 genes, whose function 
will be explained in Section 4.2. The cell is initially in stage 1 (phase G\). At this level the cell cycle is 
defined by the following rules: 

R\ : (m)^ p f ^ 3r J (X | stagei) {mf [pf]r J (X | stagei \ visualisedi) 

Rl - { m )\ p j]. r \ ((")« J (cr.x\cr.y) | stage 2 )°^ (m)f p f] r \ ((n)£j (2cr.x | Icr.y) \ stage 2 \ visualised 2 ) 

Ri '■ ( n ) [(o,o,o),/]^ J ( 2cr - i 1 2cr -y) I sta S e ^ ^ 

( n )[(-^,o,o),/],^ I cr ^ I ( n ) [(J,o,o),/],| J ( cni I cr '^ I stage3 I visualised i 
R 4 : {m)\ p f] r j {{nf u j X \ (n) L y j Y \ stage,) ^ 

( m )[p,/],T J ((")« -I X I ■ Sffl " ?£ ' 4 I visualised *) \ { m )[getposJ],% J (Mm J 7 I I visualised 4) 

In the above rules we only model objects without autonomous movement. Function / on their spatial 
information represents a function that maps from position p to the same position p. Every cell is assumed 
to double its volume during cell cycle. This is shown by rule R\, which represents the growing process 
in phase G\. By changing the cell radius from ^ to r, the volume is nearly doubled. Rule R2 represents 
the chromosomes replication inside nucleus. It is represented by modifying symbol cr that precedes 
each chromosome to 2cr. Rule R3 represents the nucleus division, where the only nucleus inside a cell 
is duplicated into two identical nuclei. To avoid collision between nuclei, pairs of nuclei are moved 
toward opposite directions. Finally rule R4 represents the cytokinesis, which divides the cell and all its 
contents into two daughter cells with the same size and content. This is represented by (1) removing 
the mother cell whose radius is r and having two nuclei, (2) putting a daughter cell with radius ^ 
at mother cell's position, (3) putting a daughter cell with radius ^ at a new position determined by 
getposQ. Symbols st age i,stage2, stagej, stage 4 are used to model the current visual stage of a cell. 
Symbols visualised\,visualised2-,visualisedj,^visualised, decompose each visual stage into two visual 
states, which model whether the current stage has been visualised or not. Therefore, at visual level, 
rewrite rule /?; models the transition between the two visual states that correspond to visual stage i, that 
is from the non-visualised to the visualised state of stage i. 

Although in general rewrite rules at visual level modify spatial information, this is not the case for 
rule R2 because chromosomes are logically positioned within the nucleus, but do not have any quantita- 
tive spatial information associated with them. In fact rule R2 just double the number of chromosomes; 
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choices about where to visualise the duplicated chromosomes are purely aesthetic and are left to the 
implementation of the visualisation. 

The constant associated with each rule defines the rate with which that rule is applied. The rate for 
rule Rj is calculated as the inverse of the average duration for visual stage i. This ensures that the time 
needed for each stage to be visualised mimics the actual time observed in nature. The last column of 
Table 1 shows the average durations of the four stages. 

4.2 Molecular Level and Vertical Rules 

To model cell cycle at molecular level, we adopt the model of budding yeast cell cycle introduced by 
Chen et al. [9]. It is a very detailed model based on differential equations. There is also a variant of this 
model for the eukaryotic cell cycle [11]. Li et al. developed a simpler model of budding yeast cell cycle 
using boolean network [24]. 

Cell cycle is controlled by complexes of cyclin and cyclin-dependent kinases. There are 4 kinds of 
cyclins involved in cell cycle: 

• Cln3, which starts phase G\ ; 

• Cln2, which induce the G\/S transition; 

• Clb5, which controls the phase S; 

• Clb2, which controls the phase M. 

These four cyclins form complexes with Cdc28. 

We define rules that control the state of the system at molecular level. To define a link between 
the visual state (which is controlled by the horizontal rules at visual level) and the biochemical state at 
molecular level, we define 4 vertical rules that cause a transition to next visual stage when a specific 
condition at molecular level is verified. Since these rules do not correspond to any time-consuming 
biochemical process but operate at a meta-level by providing a link between distinct representation levels, 
we define them as instantaneous by using oo as the value for their rates. 

Let condi be the condition at molecular level that triggers a transition from visual stage i to next 
visual stage. We define vertical rule 



Symbol visualisedi is introduced by the application of rule /?,, which marks the completion of the visu- 
alisation of stage i. The completion of the visualisation of the current stage is obviously a precondition 
for the transition to next visual stage. Therefore, symbol visualisedi appears as a precondition in vertical 
rule Tj, which defines the transition to next visual stage next{i). The removal of visualisedi by rule 7} 
enables rule R next i{\ to be applied. 

Vertical rules also allow the introduction and removal of biochemical signals whose accumulation 
or degradation is not sufficiently understood in order to be dealt with at molecular level. The form of 
vertical rules that deal with introduction and removal of biochemical signals is as follows: 



7} : visualisedi\condi\stagei 



CO 



condi\stage next{{) 



where 




• introduction 



, 7} : visualisedi\condi\stagei 



condi | stage next (i) \ signal 
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• removal, 7} : signal\visualisedi\condi\stagei condi\stage next ^ 

We use the following convention for naming objects. Molecule names start with capital letters, 
except in some cases where the molecules could be in two different statuses. For example, we pre- 
fix the molecule names with i to indicate that this molecule is in inactive status. We also prefix the 
molecule names with p when the molecule is in phosphorylated status. Names starting with g are used 
for genes. We use '-' to concatenate two names of molecules, indicating a complex formed by binding 
two molecules. 

Cell cycle starts when a cell grows in phase G\ . This is triggered by a growth factor, which is present 
in the environment, and binds with its receptor in the cell membrane. The resultant complex then triggers 
the production of cyclin Cln3. Cyclin Cln3 (after binding with its kinase partner Cdc28) activates SBF 
and MBF, the transcription factors for cyclins Cln2 and Clb5. Genes gN2 and gB5 control expression of 
cyclin Cln2 and Clb5. Cyclin Cln2 controls the transition between phase G\ and S. Cyclin Clb5 controls 
phase S. At this point some molecules that are not needed in this phase, but will be needed in later phases, 
are temporarily deactivated. In phase G\, Clb5 is deactivated by Sicl. Later Cln2 forms a complex with 
Cdc28 and phosphorylates Sicl, releasing Clb5. Clb2, which is a cyclin needed in mitosis, is bound with 
Sicl in phase G\. Cdcl4, which is needed in mitosis exit, is bound with Netl in phase G\. Cdcl4, which 
is abundant in this phase also activates phosphorylated Sicl, the Clb5 inhibitor. 

Based on their duration, we classify reactions into four categories: very fast, fast, slow and very 
slow. We define four numerical values to characterise reaction rates for the categories above: 20, 5, 1, 
0.25. The higher the rate, the faster the reaction. Obviously, reaction times are many magnitudes smaller 
than durations of visual stages defined in Table 1 . Therefore, we introduce a speeding factor s, which 
defines the ratio between these magnitudes for the specific visual representation we are modelling. The 
actual rate of a reaction is then given by the product between the speeding factor and the numerical value 
corresponding to the category of that reaction. 

To simplify the model, we don't define rules for complexation of cyclins with its kinase partners 
(Cdc28). Phase G\ is therefore modelled as follows: 
Si: GF | (GFR \ Y) L \X 3^ (iGFR \ Y) L \X \ Cln3 

5 2 : Cln3 | iSBF \ iMBF ^ Cln3 \ SBF \ MBF 

5 3 : SBF | (n) L \ (y.gNl.x \ Y) SBF \ {n) L \ {y.gNl.x \ Y) \ Clnl 
S A : MBF | («) L J (y.gB5.x \ Y) ^ MBF \ (n) L \ (y.gB5.x \ Y) \ CW5 

5 5 : Sicl | Clb5 ^ Sicl - Clb5 

5 6 : Netl | CdcU ^Netl- CdcU 

5 7 : pSicl | CdcU Sicl \ CdcU 

Sg : Sicl | Clbl ^ Sicl - Clb2 

S 9 : Clnl | Sicl - Clb5 ^ pSicl \ Clb5 \ Clnl 
The accumulation of Cln2 triggers the transition from phase G\ to S [9]. We introduce the following 
vertical rule: 

Ti : visualisedi\Clnl mc( - an2 ^\stag ei ^ ClnT\stage 2 

to instantaneously perform the transition from stage 1 to stage 2, after rule Ri has visualised the cell 
growth. The function mc(r, i) represents the minimum concentration of r that is needed to trigger the 
transition to stage i. 

Cyclin Clb5 forms a complex with its kinase partner Cdc28, and triggers the duplication of chromo- 
somes. Meanwhile Cln2 is not needed anymore, so it is degraded by SCF. Protein SCF also degrades 
phosphorylated Sicl, enabling Clb5 to optimally work. The cyclin-dependent kinases (Cln2-Cdc28 and 
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Clb5-Cdc28) activate Mcml, which is the transcription factor for Clb2. Expression of Clb2 is controlled 
by gene gB2. Cyclin Clb2 controls phase M. Active Cdhl, whose function is to degrade Clb2, is deac- 
tivated by cyclin-dependent kinases. In this way the degradation of Clb2 is postponed until the end of 
phase M. Finally Clb2 is transcribed and then binds with Cdc28. 
Phase S is therefore modelled as follows: 

Sio : pSicl | SCF ^ SCF 

S n : Cln2\SCF^SCF 

S u : Clnl | Cdhl 3°^ Cln2 \ iCdhl 

Sn : Clb5 \ Cdhl ^ iCdhl \ CW5 

S u : Clb5 | iMcml Mcml \ Clb5 

Sis ■ Mcml | {n) L \ {y.gBl.x \ Y) h^> iMcml \ (n) L \ {y.gBl.x \ Y) \ Clb2 
The accumulation of Clb5 is the event that triggers the transition from phase S to phase G^. We 
introduce the following vertical rule: 

T 2 : visualised 2 \Clb5 mc{clb5 ' i) \stage 2 ^ Clb5 mc{clb5 ^\stage 3 \SPN 

to instantaneously perform the transition from stage 2 to stage 3, after rule R2 has visualised chromosome 
duplication. Besides changing visual stage, the above rule also sends a signal (SPN) to start metaphase 
spindle. This signal is needed to activate Cdcl5 in mitosis. Since it is unknown how to relate the 
accumulation of this signal with biochemical reactions at molecular level, we assume that this signal is 
available since the beginning of phase G 2 . 

The cyclin-dependent kinase (CDK) Clb2-Cdc28 is the main controller of phase G2 and M. This 
CDK activates Mcml, allowing Clb2 to accumulate. It also degrades MBF and SBF, stopping the tran- 
scription of Cln2 and Clb5. Cyclin Cln2 is then degraded by SCF, while the degradation of Clb5 is 
regulated by Cdc20 and APC {Anaphase Promoting Complex). Mcml stimulates gene gC20 to pro- 
duce Cdc20 and APC must be phosphorylated by Clb2-Cdc28 before it can bind with Cdc20. During 
metaphase, the SPN signal activates Cdcl5. Protein Cdcl5 then phosphorylates Netl, releasing Cdcl4. 
Protein Cdcl4 is needed later in mitosis exit and also activates Cdhl . Tumor suppressor Cdhl is needed 
for Clb2 degradation. 

5i 6 : Clbl I iMcml 3°^ Mcml \ CW2 

Sn : Clbl \ MBF ^ Clb2 \ iMBF 

Sn : Clb2 \ SBF ^ Clb2 \ iSBF 

519 : Mcml \ (n) L \ (y.gC20.x \ Y) 3°4 iMcml \ {n) L \ (y.gC20.x \ Y) \ Cdc2Q 

5 20 : Clb2\ APC 3^ APC -P\Clb2 

5 2 1 : APC - P I CdclQ ^ APC - Cdc20 

5 22 : SPN I iCdcl5 SPN \ Cdcl5 

S 23 : Cdcl5 \Netl- CdcU Netl \ CdcU 

5 24 : APC - CdclQ \ CW5 ^ APC 

5 25 : iCdhl \ CdcU Cdhl \ CdcU 

The accumulation of APC — Cdc20 is the event that triggers the beginning of cytokinesis. We intro- 
duce the following vertical rule: 

T 3 : SPN\visualised 3 \APC - C dc20 mc ^ APC - cdc20A '> \stage 3 ^ APC - Cdc2Q mc{APC - cdc2() '%tage A 

to instantaneously perform the transition from stage 3 to stage 4, after rule R3 has visualised nucleus 
division. It also removes signal SPN which is not needed anymore in cytokinesis, but whose degradation 
is not clearly understood in terms of biochemical reactions. 
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The main activity in the stage 4 is the degradation of Clb2 by APC (with the help of CdclO or Cdhl). 
Protein Cdcl4 is also active in this stage, producing Sicl, which is needed to return to phase G\. 
S 2 6- CdcU^Sicl\CdcU 
S 2 i : APC - CdclO I CW2 ^ APC 

S 2S : APC \Cdhl\ Clbl APC \ Cdh 1 
Finally, transition from cytokinesis to phase Gi is triggered by the accumulation of Sicl. We intro- 
duce the following vertical rule: 



to instantaneously perform the transition from stage 4 back to stage 1 , after rule R4 has visualised cell 
division. 

4.3 Algorithms for Simulation and Visualisation 

The model of cell cycle defined using the formal approach introduced in Section 4.1 and Section 4.2 is 
used as the input for a variant of Gillespie's simulation algorithm which extends the variant introduced by 
Basuki, Cerone and Milazzo [4]. The extension consists of the selection of the cell in which the current 
reaction has to occur and the control of the visualisation. 

Step Input M reactions R\ , . . . ,Rm, and ./V values X\,...,X^ representing the initial numbers of each 
of N kinds Si, . . . ,Bn of molecules. Initialise time variable t to 0. Calculate propensities aj, for 
j=l,...,M. Calculate L^Li aj. 

Step 1 If visualisedi is present in the term then visualise stage / and execute vertical rule 7). 

Step 2 If the space is fully occupied then stop simulation. Otherwise generate r\ and calculate x. Incre- 
ment t by X. 

Step 3 Generate r^ and calculate (/x, a). 

Step 4 Execute R^ inside cell a. Update X\ , . . . and a\ , . . . ,cim according to the execution of R^. 

Step 5 Calculate Ylf=\ a j- Return to Step 1. 

Gillespie defines that the probability of reaction R } to occur is proportional to aj, the propensity of 
reaction Rj. The propensity of reaction Rj is the product of its rate cj and the number hj of distinct 
combinations of reacting molecules. 

Since reactions are confined within cells, we need to extend Gillespie's algorithm to choose in which 
cell the chosen reaction R^ should occur. Let C be the number of cells and X l k be the number of molecules 
of kind B k in the i-th cell. We define X k = ^f =l X[. 

Let a'j be the propensity of reaction Rj occurring inside the i-th cell. Then a'j is defined as the 
product of Cj by the number h'j of distinct combinations of reacting molecules of Rj within the i-th cell. 
We define aj = Y4L1 If t is the current simulation time, then t + x represents the time at which next 
reaction occurs, with x exponentially distributed with parameter ao = Y!f=\ a j- Time increment X, the 
index of the reaction that occurs at time t + X and the index a of the cell in which such reaction occurs 
are calculated as follows. 



r 3 : visualised 4 \Sicl m < Sicl ^\stage 4 ^ Sid mciSicl ^\stage 1 




(1) 



H C7-1 H a 

(/X, a) = the integers for which a'j < ^0 < ^^ a 'j 

y=l/=l 7=1 i=l 



(2) 
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where r\ and ri_ are two random real numbers which are uniformly distributed over interval [0,1]. 

We assume the existence of a function, called getposQ, that is responsible to find the correct position 
for a newborn cell and to resolve the spatial conflict that arises between cells. Naturally, the newborn 
cell should be attached to its parent cell. If we model the system in ^-dimensional space, there are 2n 
positions for this newborn cell. Function getposi) will find an empty position among these 2n positions. 
If it cannot find an empty position, it will choose one position and then push forward the other cells on 
that direction by one position. Since the space is limited by a sphere if the last cell is adjacent to the 
sphere boundary, it cannot be pushed forward. In this case getpos{) will search for any empty position 
for this cell. If no more empty position can be found, then the simulation must stop. 

Figure 2 shows an example in 2-dimensional space. In the left picture, cell number 1 is about to 
divide, but all neighboring positions are occupied by other cells. Then in the right picture getpos{) 
chooses the position of cell number 2 as its target and divides cell number 1 into two newborn cells (grey 
colour). It pushes cell number 2 toward cell number 3, but cannot push cell number 3 forward because 
of the boundary. Then cell number 3 is moved to an adjacent empty position, along a different direction. 




Figure 2: Application of getpos{) 

After executing reaction (step 4 of the algorithm), we need to update the molecular populations 
and propensity functions that are affected by application of R^. Gibson and Bruck [17] define a data 
structure to support an extension of Gillespie's First Reaction Method. We use their notion of dependency 
graph in order to simplify the process of updating molecular populations and propensity functions. In 
this way we need to update them only if they are affected by the application of reaction R^ . 

4.4 Visualisation of Cell Cycle 

Figure 3 shows the visualisation of cell cycle in our tool. The picture on the top left shows the initial 
stage of the system, in which the sphere that limits the proliferation space only contains one small cell. 
This cell grows (top centre) and duplicates its chromosomes (top right). Nucleus division (bottom left) 
and cell division (bottom centre) complete the cell cycle. All newly born cells concurrently repeat the 
cell cycle and finally the simulation stops because the space is full of cells (bottom right). 

The tool also allows to model the existence of a virus in the environment where cells proliferate. 
The virus can infect a cell through its membrane and duplicates itself inside the cell. We want to relate 
the virus infection on a cell with the cell's capability to proliferate. We model a virus that synthesises a 
protein able to degrade the growth factor receptor, which is needed in the growing phase of a cell. The 
virus can then move from the cell through the cell membrane back into the environment. From there 
it can then spread to another cell. We can classify the infection level of a cell based on the number of 
viruses inside it. We define a threshold virusTH and use it to classify the level of infection of a cell into 
three classes as follows. 
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Figure 3: Visualisation of cell cycle by our tool 



• healthy cell, if no virus is inside the cell; 

• lightly infected cell, if the number of viruses inside it is less than virusTH; 

• severely infected cell, if the number of viruses inside it is greater than or equal to virusTH. 

To represent the infection level of a cell, we use colours in our visualisation. We colour healthy cells 
with orange, lightly infected cells with green and severely infected cells with blue. Figure 4 shows the 
visualisation of virus attack in our tool. The tool also supports the generation of a report file, describing 
the details of reactions occurring in the system. By combining our observation from the visualisation 
and the report, we can find interesting things about the model. For instance, in the visualisation we 
may observe that a severely infected cell can no longer proliferate, while other infected cells can still 
proliferate. We can later analyse the report generated by the tool and find out that the observed situation 
is related to the stage of the cell at the moment when it is infected by virus. If the cell is infected while 
it is still growing, this may cause the cell to loose its growing capability, which means that the cell can 
no longer proliferate. In contrast if the cell is infected after the growing phase is already over, it can still 
proliferate. 

5 Conclusions and Future Work 

We have defined an approach to model biological systems at different levels of representation: a molec- 
ular level in which rewrite rules are used to model biochemical reactions among molecules, and one or 
more visual levels, in which rewrite rules define the dynamics of a higher level of organisation of the 
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Figure 4: Visualisation of virus attack 



biological system under analysis. We have chosen Spatial CLS to model all representation levels in or- 
der to formally describe, for every visual level, the spatial information needed to quantitatively define all 
visual details that are believed to be essential for an effective visualisation at that level. Visual details 
whose quantitative definition is a purely aesthetic matter are not included in the formal model and their 
quantitative definition is left to the implementation. Each molecular or visual level is defined by an initial 
term and a set of rewrite rules, which we called horizontal rules since they only refer to that level. Each 
visual level is linked to the molecular level by a meta-level of vertical rules, which also cater to lack of 
sufficient knowledge at molecular level. 

We have then presented the budding yeast cell cycle as the case study to illustrate our approach, 
choosing the cellular level as the single visual level. Those visual details for which a quantitative charac- 
terisation is essential for visualisation, such as the position and size of cells, have been formally modelled, 
whereas irrelevant quantitative details, such as the position within the nucleus of duplicated chromo- 
somes, are not. 

We have defined a variant of Gillespie's algorithm that exploits spatial information to choose the 
cell in which next reaction has to occur according to the exponential distribution of reaction time, and 
implemented the algorithm in a tool to illustrate the budding yeast cell cycle case study. In addition 
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to the normal cell cycle, the tool supports the injection of a virus in the environment where the cells 
proliferate. The virus can enter a cell through its membrane, duplicate itself inside the cell, and cause 
the degradation of the growth factor receptor. The visualisation shows that some small infected cells can 
no longer proliferate while big infected cells can still proliferate. Although this specific situation is an 
obvious consequence of the degradation effect the virus has on the growth factor receptor of the cell, it 
actually illustrates that, in general, the visualisation of higher levels of organisation has the potential to 
highlight important behaviours that would not be captured through a formal analysis conducted at the 
molecular level. 

As future work, we plan to extend the work by Basuki, Cerone and Milazzo [4], by implementing 
Spatial CLS within the MAUDE tool and parsing the output generated by MAUDE to provide the ap- 
propriate input to our tool. In this way, simulations performed using MAUDE could be visualised using 
the tool. Moreover, counterexamples generated by the model-checker associated with MAUDE could be 
also visualised. Finally we plan several extensions to the tool implementation, such as 

• the definition of several visual levels, corresponding to different magnifications of the biological 
system, with zooming capabilities to move from consecutive visual levels; 

• the capabilities to save simulations and to re- visualise them at a later stage; 

• the structuring and visualisation of the textual information, currently provided as a single report 
file, by attaching it to the visual objects for which it is relevant. 
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